Uncovering the mechanisms underlying network formation is a long-standing challenge for data
mining and network analysis. In particular, the microscopic organizing principles of directed
networks are less understood than those of undirected networks. This article proposes a hypothesis
named potential theory, which assumes that every directed link corresponds to a decrease of one unit
of potential and that subgraphs with definable potential values on all nodes are preferred. Combining
the potential theory with the clustering and homophily mechanisms, we deduce that the Bi-fan,
consisting of 4 nodes and 4 directed links, is the most favoured local structure in directed networks.
Our hypothesis receives positive support from extensive experiments on 15 directed networks drawn
from disparate fields, as indicated by the most accurate and robust performance of the Bi-fan
predictor within the link prediction framework. In summary, our main contributions are twofold:
(i) we propose a new mechanism for the organization of directed networks; (ii) we design the
corresponding link prediction algorithm, which can not only test our hypothesis, but also find direct
applications in missing link prediction and friendship recommendation.

I. INTRODUCTION

Many social, biological and technological systems can
be well described by networks, where nodes represent
individuals and links denote the relations or interactions
between nodes. The study of the structure and functions of
networks has therefore become a common focus of many
branches of science. A big challenge attracting increasing
attention in the recent decade is to uncover the
mechanisms underlying the formation of networks.
Macroscopic mechanisms include the rich-get-richer,
the good-get-richer, the stability constraints, and so
on. Microscopic mechanisms include homophily [6],
clustering, balance theory [1], and so on. Mechanisms can
also play a part in regulating the mesoscopic structure,
like the formation and transformation of groups and
communities. Real networks usually result from
a hybrid of several mechanisms; for example, new nodes
may form links according to the rich-get-richer mechanism,
and simultaneously, new links among old nodes
could be a consequence of the clustering mechanism.

A number of systems are naturally described by
directed networks: the world wide web is made up of
directed hyperlinks, food webs consist of directed links
from predators to prey, and in microblogging social
networks, fans form links pointing to their opinion
leaders. The formation of directed links also obeys some
general mechanisms; for example, users in Twitter are
likely to form links to neighbors of their neighbors and
to users of similar age, which is in accordance with the
clustering and homophily mechanisms. Reciprocity
is a specific mechanism for directed networks [14]. It is
valid for many social networks, but inapplicable to some
others such as food webs. Compared with undirected
networks, link formation in directed networks has received
less attention and thus is not well understood.

In this article, we propose a hypothesis on link
formation for general directed networks, named potential
theory. Combining the potential theory with the clustering
and homophily mechanisms, we deduce a certain
preferred subgraph. We apply the link prediction
approach to verify our deduction. That is, we hide a
small fraction of links and predict them by assuming that
a link generating more preferred subgraphs is of higher
probability to exist. Experiments on disparate directed
networks, ranging from large-scale social networks
containing millions of individuals to small-scale food webs
consisting of about a hundred species, show that the
prediction according to the preferred subgraph is remarkably
more accurate and robust than the prediction according to
other comparable subgraphs. Besides the insights into the
underlying mechanism of directed network formation,
our work could find applications in friendship
recommendation for social networks and missing link
prediction for biological networks.

This article is organized as follows. In Section 2, we
introduce some closely related works. Our perspectives
and methods are presented in Section 3. The data
description, experimental results and analyses are shown
in Section 4. Lastly, in Section 5, we summarize the main
findings and outline the implications of this work.



II. RELATED WORK 

As we will show in Section 3, the potential theory is
locally applicable, and thus we only introduce previous
work on microscopic mechanisms. In addition, we
introduce studies on link prediction, emphasizing the
applications of link prediction approaches to mechanism
evaluation.



A. Microscopic Mechanisms 

The clustering mechanism declares that two nodes have a
high probability to make a link between them if they
share some common neighbors. This mechanism is
indirectly supported by increasing evidence of high
clustering coefficients (the clustering coefficient of a node is
defined as the density of links among its neighbors, and the
clustering coefficient of the whole network is the average
over all nodes) in disparate networks. Through an
investigation of a social network comprising 43,553
university members, Kossinets and Watts found direct
evidence that two students who share more common
acquaintances are more likely to become acquainted with each
other. The clustering mechanism also works for directed
networks; for example, in Twitter, more than 90% of new
links are added between nodes having at least one
common neighbor. In addition, evolving network models
driven by common neighbors could reproduce some
significant features of both real directed and undirected
networks.

The homophily mechanism states the observed tendency of
people to associate with others of similar profiles and/or
experiences. Experiments on social networks thus
far strongly support this mechanism. Positive evidence
comes from various channels, such as an acquaintance
network of university members [17], a large-scale
instant-messaging network containing 1.8 x 10^ individuals [20],
friendship networks of a set of American high schools,
a social network of a cohort of college students in
Facebook, and so on. A variety of characteristics are
shown to be significant to link formation, including
race, tastes in music and movies, grade, age, location,
language, shared experience, etc. The homophily
mechanism also plays a role in other kinds of networks; for
example, in directed document networks, links (e.g.,
hyperlinks between web pages and citations between
articles) tend to connect documents with high content
similarities. In some literature, the clustering mechanism
is considered a special case of the homophily
mechanism, where two nodes having some common neighbors
are recognized as having similar network surroundings.
We prefer to distinguish these two mechanisms. Recent
experiments on directed social networks show that the
clustering mechanism may be even stronger than the
homophily mechanism [23].

The reciprocity mechanism is the tendency of nodes to
respond to incoming links by creating links back to the
source. It is a specific mechanism for directed
networks, but not applicable everywhere. For example,
the reciprocity mechanism plays a significant role in the
growth of the social networks of Facebook and Flickr
[26], while it has much less impact on Slashdot [13] and
does not work at all for food webs [28].

FIG. 1: Illustration of four example graphs. Graphs (b) and
(d) are potential-definable, and the numbers labeled beside
nodes are example potentials. Graphs (a) and (c) are not
potential-definable: if we set the top nodes' potential to
be 1, some nodes' potentials cannot be determined according
to the constraint that a directed link is always associated with
a decrease of one unit of potential.



B. Link Prediction 

Link prediction is one of the most fundamental
problems in networked data mining. It attempts to
estimate the likelihood of the existence of a link between two
nodes, according to the observed links and the attributes of
nodes [13]. It has found applications in predicting
missing links of biological networks and recommending
friends in online social networks. To evaluate the
algorithmic performance, we usually divide the data into
two parts and use the training set to predict the testing
set. The algorithmic accuracy is quantified by counting
the overlap of the prediction and the testing set and/or the
ranks of links in the testing set among all non-observed
links (i.e., node pairs not in the training set).

Link prediction algorithms can be used to judge the
driving mechanisms of network formation. The very
good performance of common-neighbor-based similarity
indices strongly supports the validity of the clustering
mechanism [30, 31]. On several real friendship networks, Aiello
et al. [32] showed that when combined with topological
features, topical similarity achieves a link prediction
accuracy of about 92%. Via a link prediction approach,
Wang et al. found that mobility homophily and
structural clustering are both significant for link
formation. Leskovec et al. tested the validity of
social balance theory through link prediction in signed
networks.



III. METHODS 

This section presents our methods in detail.
First, we introduce the perspectives of potential
theory, and then deduce the preferred subgraph by
combining the potential theory with the clustering and
homophily mechanisms. Lastly, we build a bridge
between the preferred subgraph and the link prediction
problem, namely how to test our hypothesis by using a
link prediction algorithm.




FIG. 2: Considering subgraphs of (a) that contain nodes
{1,2}, according to the traditional definition, (b) is the unique
one, while in our definition, graphs (b), (c) and (d) are all
subgraphs. Notice that the empty graph containing nodes 1
and 2 and no link is also a subgraph of (a) according to our
definition.



A. The Potential Theory 

A graph is called potential-definable if each node can
be assigned a potential such that for every pair of nodes
i and j, if there is a link from i to j, then i's potential is
one unit higher than j's. Clearly, a single link is
potential-definable, yet a graph containing reciprocal links is not
potential-definable. Figure 1 illustrates some example graphs
with orders from 2 to 4, where graphs (a) and (c) are not
potential-definable and graphs (b) and (d) are
potential-definable.
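
The definition above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes a graph is given as a list of nodes and a list of (i, j) pairs meaning a link from i to j, and the function name `assign_potentials` is hypothetical. It walks the graph ignoring link direction, lowers the potential by one along a link and raises it by one against a link, and reports failure as soon as two walks disagree on a node's potential.

```python
from collections import deque


def assign_potentials(nodes, links):
    """Return {node: potential} if the graph is potential-definable, else None.

    `links` holds (i, j) pairs meaning a directed link i -> j (an assumed
    input format). Potentials are fixed only up to a constant per weakly
    connected component; each component's first node is given potential 0.
    """
    steps = {v: [] for v in nodes}
    for i, j in links:
        steps[i].append((j, -1))  # following i -> j lowers the potential by one
        steps[j].append((i, +1))  # walking against the link raises it by one

    potential = {}
    for start in nodes:
        if start in potential:
            continue
        potential[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, delta in steps[u]:
                expected = potential[u] + delta
                if v not in potential:
                    potential[v] = expected
                    queue.append(v)
                elif potential[v] != expected:
                    return None  # two walks disagree: not potential-definable
    return potential


print(assign_potentials([1, 2], [(1, 2)]))                     # a single link
print(assign_potentials([1, 2], [(1, 2), (2, 1)]))             # reciprocal links
print(assign_potentials([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))  # a feed forward loop
```

Reciprocal links fail because the two links demand that each endpoint be one unit higher than the other, which matches the remark in the text.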



The potential theory claims that a link that can
generate more potential-definable subgraphs is more
significant and thus of higher probability to appear. Our
definition of subgraph is more general than the traditional one.
Given a directed graph D(V, E) with V and E the sets
of nodes and directed links, according to the traditional
definition, a graph D'(V', E') is called a subgraph of D if
V' ⊂ V and E' contains all the links in E that connect
two nodes in V'. Our definition only requires V' ⊂ V
and E' ⊂ E, that is, E' need not include all
links connecting nodes in V'. As shown in figure 2, (b),
(c) and (d) are subgraphs of (a) according to our
definition, yet only (b) is a subgraph of (a) in the traditional
definition.



B. The Preferred Subgraph 

Since any graph containing reciprocal links is not
potential-definable, here we do not take into account
the reciprocity mechanism. The clustering mechanism
prefers short loops (not necessarily directed loops)
and it only works in the local surroundings, and thus we only
consider loop-embedded subgraphs of orders 3 and 4.
Two nodes connected by reciprocal links are not treated
as a loop. To avoid repeated counting, we only consider
the minimal loop-embedded subgraphs, which do not
themselves contain smaller loop-embedded subgraphs.

Figure 3 illustrates all the six different minimal
loop-embedded subgraphs of orders 3 and 4. These subgraphs
are named after Ref. [35], yet our motivation is different
from motif analysis and we adopt a different definition
of subgraph. Among these six subgraphs, only Bi-fan
and Bi-parallel are potential-definable. In a
potential-definable subgraph, two nodes with the same potential
cannot directly connect to each other, and thus the
homophily mechanism only works when we consider the
subgraphs as a whole. For Bi-fan, the links are
equivalent to each other and nodes are of two different
potentials, while in Bi-parallel, links are different (two are
from high-potential nodes to moderate-potential nodes,
and the other two are from moderate-potential nodes to
low-potential nodes) and nodes are of three different
potentials. According to the assigned potentials, we could
say that the Bi-fan structure is more homogeneous than the
Bi-parallel structure, and thus the homophily mechanism
prefers the former.
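
The claim that only Bi-fan and Bi-parallel are potential-definable can be checked by brute force. The sketch below rests on one assumption that the text implies but does not spell out: each minimal loop-embedded subgraph of order 3 or 4 is an orientation of a single undirected cycle. Walking once around the cycle, the potential drops by one along a link and rises by one against it, so an orientation is potential-definable only if the walk returns to its starting potential. The function names are hypothetical.

```python
from itertools import product


def potential_levels(signs):
    """Walk once around a cycle and collect the potentials met on the way.

    signs[k] is -1 if the k-th link points along the walk (potential drops
    by one) and +1 if it points against the walk. Returns the set of
    potentials, or None if the walk does not close consistently.
    """
    current = 0
    seen = {0}
    for s in signs:
        current += s
        seen.add(current)
    return seen if current == 0 else None


def count_by_levels(order):
    """Map number of distinct potential levels -> number of labelled
    orientations of a cycle with `order` nodes that are potential-definable."""
    counts = {}
    for signs in product((-1, +1), repeat=order):
        levels = potential_levels(signs)
        if levels is not None:
            counts[len(levels)] = counts.get(len(levels), 0) + 1
    return counts


print(count_by_levels(3))  # orientations of a triangle (3-FFL, 3-Loop)
print(count_by_levels(4))  # orientations of a 4-cycle
```

No orientation of a triangle closes consistently, since three steps of ±1 cannot sum to zero. For the 4-cycle, the orientations with two potential levels have alternating link directions (Bi-fan) and those with three levels have two consecutive links in each direction (Bi-parallel), in line with the two-level versus three-level comparison made in the text.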

In short, taking into account the potential theory
together with the clustering and homophily mechanisms,
we think that the Bi-fan subgraph is the most preferred
one, and thus a link that can generate more Bi-fan
structures should be of higher probability to exist.

C. Link Prediction Algorithm 

Given a directed network D(V, E), the fundamental
task of a link prediction algorithm is to give a rank of
all non-observed links in the set U \ E, where U is the
universal set containing all |V|(|V| − 1) possible directed
links. If one wants to find missing links or recommend
friendships, one can go for the links with the highest
ranks. The mainstream method is to assign each
non-observed link a score, and the one with the higher score is
ranked higher.

FIG. 3: All the six minimal loop-embedded subgraphs of orders 3 and 4. They are named after Ref. [35], where 3-FFL
and 4-FFL stand for three-order and four-order feed forward loops, and 3-Loop and 4-Loop mean three-order and four-order
feedback loops, respectively.

We design the predictors corresponding to the six
minimal loop-embedded subgraphs shown in figure 3. By
removing one link from every subgraph, we get twelve
predictors as shown in figure 4. If we adopt the predictor
Si, the score of a non-observed link u → w is
defined as the number of the ith subgraphs created by the
addition of this link. Notice that a link may generate
ten 3-FFLs, but its roles in them can be different. For example,
these ten 3-FFLs may include two S1, three S2 and five
S3. So if we adopt the predictor S2, the score of this link
is three. Therefore, if we would like to see the
contribution of a link to the created 3-FFLs, we can adopt the
predictor S1 + S2 + S3, which means that the score of a
non-observed link is defined as the sum of the S1, S2
and S3 created by this link, equivalent to the number of created
3-FFLs. Figure 5 illustrates a simple example of how
we calculate the scores.
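
As an illustration of such a score, here is a minimal sketch of the Bi-fan predictor S5, under stated assumptions rather than as the authors' code. It takes the Bi-fan to be two source nodes each linking to the same two target nodes, as the two-potential description in the text suggests, assumes the network has no self-loops, and uses hypothetical function names. Adding u → w then completes one Bi-fan for every pair (x, y) with existing links u → x, y → x and y → w, where x ≠ w and y ≠ u.

```python
from collections import defaultdict


def build_adjacency(links):
    """Successor and predecessor sets for a list of (i, j) links i -> j."""
    succ, pred = defaultdict(set), defaultdict(set)
    for i, j in links:
        succ[i].add(j)
        pred[j].add(i)
    return succ, pred


def bifan_score(u, w, succ, pred):
    """Number of Bi-fans that the candidate link u -> w would complete."""
    score = 0
    for x in succ.get(u, ()):          # u -> x, the other target node
        if x == w:
            continue
        for y in pred.get(x, ()):      # y -> x, the other source node
            if y != u and y != w and w in succ.get(y, ()):  # y -> w
                score += 1
    return score


def rank_non_observed(nodes, links):
    """Score every non-observed ordered pair and sort by decreasing score."""
    succ, pred = build_adjacency(links)
    observed = set(links)
    scored = [((u, w), bifan_score(u, w, succ, pred))
              for u in nodes for w in nodes
              if u != w and (u, w) not in observed]
    return sorted(scored, key=lambda item: item[1], reverse=True)


links = [("a", "c"), ("b", "c"), ("b", "d")]
print(rank_non_observed(["a", "b", "c", "d"], links)[:3])
```

Because the paper's definition of subgraph does not require all links among the chosen nodes, the count ignores any extra links among the four nodes. Scoring all |V|(|V| − 1) pairs, as `rank_non_observed` does, is only practical for small networks.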

Given a predictor, we can rank all the non-observed
links according to their scores. To evaluate the
algorithmic performance, we randomly divide the observed
links E into two parts: the training set E^T is treated
as known information, while the testing set (probe set)
E^P is used for testing and no information therein is
allowed to be used for prediction. Clearly, E = E^T ∪ E^P
and E^T ∩ E^P = ∅. In our experiments, the training set
always contains 90% of links, and the remaining 10% of
links constitute the testing set.
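
The random division can be sketched in a few lines. This is an illustrative helper with a hypothetical name, assuming the observed links are given as a collection of distinct (i, j) pairs; the 10% probe fraction follows the text, and the `seed` parameter is added only for reproducibility.

```python
import random


def split_links(links, probe_fraction=0.1, seed=None):
    """Randomly split observed links into a training set and a probe set."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = int(round(probe_fraction * len(shuffled)))
    probe = set(shuffled[:n_probe])      # hidden links, used only for testing
    training = set(shuffled[n_probe:])   # the only links a predictor may see
    return training, probe


observed = [(i, i + 1) for i in range(100)]
training, probe = split_links(observed, seed=0)
print(len(training), len(probe))
```

Scores must then be computed from the training set alone, so that no information from the probe set leaks into the prediction.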

We use a standard metric, the area under the receiver
operating characteristic (ROC) curve, to quantify the
accuracy of link prediction algorithms. It is usually
abbreviated as the AUC value. This metric can be interpreted
as the probability that a randomly chosen missing link
(a link in the testing set E^P) is given a higher score than a randomly
chosen nonexistent link (a link in U \ E). In the
implementation, among n independent comparisons,
if there are n' times the missing link having a higher score
and n'' times the missing link and the nonexistent link having
the same score, we define the AUC value as

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}.$$



If all the scores are generated from an independent and 
identical distribution, the AUC value should be about 
0.5. Therefore, the degree to which the AUC value ex- 
ceeds 0.5 indicates how much better the algorithm per- 
forms than pure chance. 
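
The sampling estimate of the AUC value described above can be sketched as follows. The function name, the default number of comparisons and the calling convention of `score` (a function of the two endpoints of a link) are assumptions made for illustration.

```python
import random


def auc_by_sampling(score, probe_links, nonexistent_links, n=10000, seed=None):
    """Estimate AUC from n independent comparisons.

    Each comparison draws one missing link (from the probe set) and one
    nonexistent link, and compares their scores; ties count one half.
    """
    rng = random.Random(seed)
    probe = list(probe_links)
    nonexistent = list(nonexistent_links)
    higher = ties = 0
    for _ in range(n):
        s_missing = score(*rng.choice(probe))
        s_absent = score(*rng.choice(nonexistent))
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            ties += 1
    return (higher + 0.5 * ties) / n


probe = {(1, 2), (3, 4)}
nonexistent = {(2, 1), (4, 3), (1, 4)}
print(auc_by_sampling(lambda u, w: 0, probe, nonexistent, n=1000, seed=1))
```

A predictor that gives every link the same score produces only ties, so the estimate is exactly 0.5, which is the pure-chance baseline mentioned in the text.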



IV. EXPERIMENTS 

This section first introduces the basic information
about the studied data, and then presents the experimental
results as well as what we can learn from these
experiments.



A. Data Description 

Our experiments include 15 real directed networks
drawn from disparate fields. Details are as follows and
the basic structural features are presented in Table I. If
a network is unconnected, we only consider its largest
weakly connected component.

Biological networks. Three of them are food webs,
representing the predator-prey relations, and another one
is the neural network of C.elegans.

• FW1 [13]: A food web consisting of 69 species living
in Everglades Graminoids during the wet season.

• FW2: A food web consisting of 97 species living
in Mangrove Estuary during the wet season.

• FW3: A food web consisting of 128 species
living in Florida Bay during the dry season.

• C.elegans: A neural network of the nematode
worm C.elegans, in which an edge joins two neurons
if they are connected by either a synapse or a gap
junction.

Information networks. We consider networks of 
documents where a directed link from i to j means the 
document i cites the document j, and a network of web- 
blogs where a directed link stands for a hyperlink. 

FIG. 4: Illustration of the twelve predictors corresponding to the subgraphs shown in figure 3. The red dashed arrows represent
the links removed from the original subgraphs. The relations are as follows: {S1, S2, S3} ⇔ 3-FFL, {S4} ⇔ 3-Loop, {S5}
⇔ Bi-fan, {S6, S7} ⇔ Bi-parallel, {S8} ⇔ 4-Loop, {S9, S10, S11, S12} ⇔ 4-FFL.



• Small & Griffith and Descendants (SmaGri) [41]:
Citations to Small & Griffith and Descendants.

• Kohonen [41]: Articles with topic "self-organizing
maps" or references to "Kohonen T".

• Scientometrics (SciMet) [41]: Articles from or citing
Scientometrics.

• Political Blogs (PB) [42]: A directed network of
hyperlinks between weblogs on US politics.

Social networks. All the following networks describe
relationships between people.

• Delicious [43]: Delicious.com, previously known
as del.icio.us, allows individuals to tag bookmarks
and follow other users. The studied
who-follow-whom network was collected in May 2008.

• Youtube [3]: YouTube is a platform where users
can share videos with others. Active users who
regularly upload videos maintain channel pages.
Other users can follow those users, thus forming
a social network. These data were collected in
January 2007.

• FriendFeed: FriendFeed is an aggregator
that consolidates updates from social media
and social networking websites, social bookmarking
websites, blogs and micro-blogging services, etc.
Members can manage their social networking
contents with one FriendFeed account and follow
others' updates. This dataset captures the
who-follow-whom relationships.

• Epinions: Epinions.com is a general consumer review
site with a who-trust-whom online social network.
Members of this site can decide whether to
"trust" each other.

• Slashdot [47]: Slashdot.org is a technology-related
news web site known for its specific user
community. This site allows individuals to tag each
other as friends or foes.

• Wikivote: Wikipedia is a free encyclopedia
written collaboratively by volunteers around
the world. Active users can be nominated to be
administrators. A public vote begins after some
users are nominated. Other users can express a
positive, negative or neutral opinion towards each
candidate. The most voted candidate will be
promoted to admin status. This process implies a
social network in which users are nodes and a vote
from one user on another constitutes a
directed link. These data are from 2794 elections
on English Wikipedia.

• Twitter: Twitter is an online social networking
service where users can post texts within 140
characters. It also allows users to "follow" other
users, whereby a user can see updates from the users
he or she follows on his or her Twitter page.



B. Results 

Table 2 shows the predicting accuracies, measured by
AUC values, of all the 12 individual predictors. In 14
out of the 15 real networks, the exception being Youtube,
the predictor S5 performs best among all the 12 predictors.
The differences between the AUC values of the predictor S5 and
those of the other predictors are usually remarkable, while
for Youtube, the performance of S5 is very close to the
optimal one, S12. The last row of Table 2 shows the
average AUC values, which again emphasizes the advantage
of S5. Roughly speaking, the very simple rule that a link
generating more Bi-fan subgraphs has a higher probability
to exist is nearly 90% right.

TABLE I: The basic structural features of the studied 15 real networks. |V| and |E| are the numbers of nodes and links, k_in^max
and k_out^max are the maxima of the in-degree and out-degree over all nodes, and ⟨k⟩ is the average degree of all nodes (the average in-degree
equals the average out-degree). ⟨d⟩ and C are the 90-percentile effective diameter [50] and the clustering coefficient for directed
networks.

FIG. 5: Illustration of the scores of links according to our
method. The red dashed arrows are probe links. If we adopt
the predictor S1, the scores for n1 → n3 and n4 → n2 are
S1(n1 → n3) = 2 (one contributing structure is n1 → n2 → n3) and
S1(n4 → n2) = 0, respectively. More examples, listed by the nodes
of the created subgraph, are as follows:
S2(n1 → n3): {n1, n2, n3}; S5(n4 → n2): {n4, n5, n1, n2};
S6(n4 → n2): {n4, n5, n3, n2}; S9(n4 → n2): {n4, n5, n1, n2}.

Table 3 compares the predicting accuracies of some
hybrid predictors. Recall that the predictor
S1 + S2 + S3 means that the score of a non-observed link
is defined as the total number of S1, S2 and S3 structures
created by the addition of this link. In fact, the six predictors
in Table 3 correspond to the six minimal loop-embedded
subgraphs in figure 3. Therefore, Table 3 directly
compares the six candidate subgraphs: which one is the most
preferred structure, such that a link generating more such
structures has a higher probability to exist. Again, Bi-fan
wins: it is the best in 11 of the 15 networks and has the
highest average AUC value.

Looking at the results presented in Table 2 and Table 3,
a significant advantage of the Bi-fan structure is its high
robustness, that is to say, even when the predictor S5 is
not the best in some cases, its performance is very close
to the optimal one. In contrast, any other predictor,
whether an individual predictor or a hybrid one, is
very sensitive to the network structure and occasionally
gives very bad predictions.



V. CONCLUSION AND DISCUSSION 

This article studied the underlying mechanism of
link formation in directed networks. We presented a
hypothesis named potential theory, which claims that a link
that can generate more potential-definable subgraphs is
of higher probability to appear. This mechanism cannot
be used alone to infer network structure, since there are
too many potential-definable subgraphs (e.g., directed
paths of any length are potential-definable). Therefore,
we also took into account two well-known local
mechanisms: clustering and homophily. By combining these
three mechanisms, we deduced that Bi-fan is the most
preferred subgraph in directed networks. In a comparison
of the link prediction accuracies of the 12 individual
predictors as well as the six minimal loop-embedded
subgraphs, Bi-fan performs best: not only for its remarkably
higher AUC value than the others, but also for its robustness,
namely that for disparate testing networks, its performance is
either the best or very close to the best.

The local driving mechanisms underlying directed
network formation are less understood than those
for undirected networks. This kind of study is thus of
theoretical significance, and our work provides insights
into the microscopic architecture of directed networks.
Although the potential theory is more complicated than
the clustering and homophily mechanisms as well as the
balance theory, its meaning is easy to grasp: the
potential-definable property implies a local hierarchy,
and the potential value of a node indicates its
level in the hierarchical structure. For example, directed
loops are not hierarchy-embedded while a directed
path is of a strictly hierarchical organization; the
former is not potential-definable and the latter is
potential-definable. The hierarchical organization is a well-known
macroscopic feature of many undirected and directed [54]
networks; our work indicates that in directed
networks, nodes tend to self-organize locally in a
hierarchical manner. We guess that this kind of microscopic
hierarchical organization contributes to the macroscopic
hierarchical structure. In the near future, we will study
more data sets in a more detailed way to check whether
the potential theory and our hypothesis on hierarchical
organization are valid, and to see the applicable range
(to which networks it applies and to what extent it can
explain the network formation) of the potential theory.

TABLE II: AUC values of the 12 predictors shown in figure 4. The best performance for each network is marked with an asterisk.
Each number is obtained by averaging over 50 implementations with independently random partitions of training set and testing
set.

Datasets    S1       S2       S3       S4       S5       S6       S7       S8       S9       S10      S11      S12
SmaGri      ...      0.6517   0.6905   0.4922   0.8852*  0.7108   0.7476   0.4851   0.6677   0.6242   0.5982   0.5761
Kohonen     0.6693   0.6124   0.6642   0.4991   0.8605*  0.6333   0.7335   0.4985   0.6148   0.5614   0.5778   0.5946
SciMet      0.6462   0.6192   0.6371   0.4980   0.8371*  0.6672   0.7045   0.4968   0.5977   0.5794   0.5753   0.5895
PB          0.9025   0.8181   0.8243   0.6948   0.9595*  0.8659   0.8679   0.7518   0.9479   0.8349   0.7616   0.8584
Delicious   0.7298   0.7077   0.7192   0.6577   0.7839*  0.7141   0.7344   0.6739   0.7378   0.7081   0.7046   0.7273
Youtube     0.7518   0.7453   0.7522   0.7456   0.8517   0.8422   0.8576   0.8442   0.8505   0.8430   0.8507   0.8624*
FriendFeed  0.8801   0.7503   0.7382   0.5895   0.9766*  0.7863   0.8100   0.7150   0.9690   0.8324   0.7318   0.8027
Epinions    0.8273   0.8326   0.8081   0.7460   0.9101*  0.8969   0.8843   0.8584   0.8995   0.8956   0.8804   0.8831
Slashdot    0.7164   0.7133   0.7124   0.7072   0.9035*  0.8984   0.8982   0.8925   0.9009   0.8982   0.8926   0.8985
Wikivote    0.9073   0.7448   0.7470   0.5962   0.9699*  0.7679   0.7451   0.6209   0.9583   0.7562   0.6096   0.7468
Twitter     0.5211   0.5033   0.5182   0.5002   0.7498*  0.5033   0.5441   0.5002   0.5549   0.5037   0.5039   0.5423
Average     0.7523   0.6641   0.6925   0.5776   0.8846*  0.7396   0.7778   0.6172   0.7756   0.6595   0.6253   0.6940

TABLE III: AUC values of the six subgraphs shown in figure 3. The best performance for each network is marked with an
asterisk. Each number is obtained by averaging over 50 implementations with independently random partitions of training set and
testing set.

Datasets    S1+S2+S3  S4       S5       S6+S7    S8       S9+S10+S11+S12
FW1         0.6953    0.4903   0.9066*  0.8462   0.4172   0.4653
FW2         0.7241    0.4809   0.8964*  0.8490   0.4972   0.4674
FW3         0.6649    0.3997   0.9105*  0.8586   0.4303   0.3283
C.elegans   0.8666    0.5671   0.8679*  0.8403   0.5755   0.7736
SmaGri      0.8400    0.4922   0.8852*  0.8154   0.4851   0.7291
Kohonen     0.8091    0.4991   0.8605*  0.7779   0.4985   0.7039
SciMet      0.7874    0.4980   0.8371*  0.7872   0.4968   0.7187
PB          0.9275    0.6948   0.9595*  0.9029   0.7518   0.9122
Delicious   0.7621    0.6577   0.7839   0.7743   0.6739   0.7893*
Youtube     0.7526    0.7456   0.8517   0.8593   0.8442   0.8625*
FriendFeed  0.7937    0.5895   0.9766*  0.9151   0.7150   0.9240
Epinions    0.8682    0.7460   0.9101   0.9131   0.8584   0.9174*
Slashdot    0.7422    0.7072   0.9035   0.9048   0.8925   0.9083*
Wikivote    0.9330    0.5962   0.9699*  0.8607   0.6209   0.9288
Twitter     0.5281    0.5002   0.7498*  0.5475   0.5002   0.5733
Average     0.7797    0.5776   0.8846*  0.8302   0.6172   0.7335

Lastly, we would like to emphasize again that link
prediction is a very fundamental problem for both
information filtering and network analysis, and
it could find countless applications. In this work, we
applied the link prediction approach to evaluate the driving
mechanisms of network formation. At the same time,
our method can be directly applied to predicting
missing links and recommending friendships in large-scale
directed networks, since the accuracy of our method is
much higher than that of the common-neighbor-based methods,
as indicated by the performance of the predictors S1, S2, S3
and S4.